A Level-wise Hierarchical Document Clustering method for Categorization

Authors

  • Kil Hong Joo
  • Nam Hun Park
Abstract

For document categorization, the many words that appear in similar documents are divided into stopwords and keywords; to describe document characteristics precisely, documents are represented by their keywords, with stopwords removed. To improve clustering precision, this paper proposes the SHODC algorithm, a seed cluster-based hierarchical document clustering method, and the DHODC method, which applies domain stopword removal and tree structure expansion for document categorization. Several experiments showed that the deeper the domain levels, the more precise the results of the suggested method compared with other algorithms. The suggested algorithm...
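
The keyword representation and domain stopword removal mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the SHODC or DHODC algorithm, whose details are not given here. The stopword list, the function names, and the document-frequency threshold `min_fraction` used to define a "domain stopword" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical general stopword list; a real system would use a much larger one.
GENERAL_STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for"}


def keywords(text, stopwords=GENERAL_STOPWORDS):
    """Lowercase the text, split on whitespace, and drop general stopwords."""
    return [w for w in text.lower().split() if w not in stopwords]


def domain_stopwords(docs, min_fraction=0.8):
    """Return words found in at least min_fraction of one domain's documents.

    Assumption: this document-frequency rule is an illustrative definition,
    not the one used in the paper.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    threshold = min_fraction * len(docs)
    return {w for w, n in doc_freq.items() if n >= threshold}


def refine(docs, min_fraction=0.8):
    """Remove domain stopwords from every keyword list in a domain."""
    dsw = domain_stopwords(docs, min_fraction)
    return [[w for w in doc if w not in dsw] for doc in docs]


# Three toy documents assumed to belong to one domain (sports).
sports = [
    keywords("the match report for the football league"),
    keywords("a match preview of the tennis open"),
    keywords("match statistics and the basketball playoffs"),
]
refined = refine(sports)
```

In this toy domain the word "match" occurs in every document, so it carries no information for splitting the domain further and is dropped before clustering at the next level.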

Similar resources

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics

This paper discusses the future of World Wide Web development, called the Semantic Web. Undoubtedly, the Web is one of the most important services on the Internet and has had the greatest impact on the spread of the Internet in human societies. Internet penetration has been an effective factor in the growth of the volume of information on the Web. The massive growth of informat...

Web Documents Categorization using Fuzzy Representation and HAC

Most of the existing techniques for characterization of Web documents are based on term-frequency analysis. In such models, given a set of documents, the characterization of each document is represented by a feature vector in a vector space. However, as Web documents written in HTML are semi-structured documents by means of tags, the traditional techniques that assign term weights only by the ...

Hierarchical Bayesian Clustering for Automatic Text Classification

Text classification, the grouping of texts into several clusters, has been used as a means of improving both the efficiency and the effectiveness of text retrieval/categorization. In this paper we propose a hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, the probability that the given texts are classified into clusters. We c...

Group-wise registration of large image dataset by hierarchical clustering and alignment

Group-wise registration has been proposed recently for consistent registration of all images in the same dataset. Since all images need to be registered simultaneously with lots of deformation parameters to be optimized, the number of images that the current group-wise registration methods can handle is limited due to the capability of CPU and physical memory in a general computer. To overcome ...

Publication date: 2015